Variational Autoencoder for Deep Learning of Images, Labels and Captions
نویسندگان
چکیده
A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.
منابع مشابه
Variational Autoencoder for Deep Learning of Images, Labels and Captions: Supplementary Material
Table 1: Semi-supervised classification accuracy (%) on the validation set of ImageNet 2012. Proportion 1% 5% 10% 20% 30% 40% top-1 AlexNet 0.1± 0.01 11.5 ± 0.72 19.8 ± 0.71 38.6 ± 0.31 43.23 ± 0.28 45.85 ± 0.23 GoogeLeNet 4.75± 0.58 22.13± 1.14 32.18± 0.80 42.83± 0.28 49.61± 0.11 51.90 ± 0.20 BSVM (ours) 43.98± 1.15 47.36± 0.91 48.41± 0.76 51.51± 0.28 54.14± 0.12 57.34± 0.18 Softmax (ours) 42....
متن کاملCombining sequential deep learning and variational Bayes for semi-supervised inference
In the application of machine learning to time series, coarse labelling of the sequence is generally known. However, often a finer granularity of the annotations is sought for improved accuracy and resolution in signal analysis. With a focus on medical time series, this study employs a bidirectional LSTM autoencoder, t-SNE, and variational Bayesian estimation in order to provide labels with fin...
متن کاملGenerating Images with Perceptual Similarity Metrics based on Deep Networks
Image-generating machine learning models are typically trained with loss functions based on distance in the image space. This often leads to over-smoothed results. We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), that mitigate this problem. Instead of computing distances in the image space, we compute distances between image features extracted by...
متن کاملGraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders
Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by having ...
متن کاملGraphvae: towards Generation of Small Graphs Using Variational Autoencoders
Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks only, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by ha...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016